137 research outputs found

    Taking advantage of hybrid systems for sparse direct solvers via task-based runtimes

    The ongoing hardware evolution exhibits an escalation in the number, as well as in the heterogeneity, of computing resources. The pressure to maintain reasonable levels of performance and portability forces application developers to leave the traditional programming paradigms and explore alternative solutions. PaStiX is a parallel sparse direct solver, based on a dynamic scheduler for modern hierarchical manycore architectures. In this paper, we study the benefits and limits of replacing the highly specialized internal scheduler of the PaStiX solver with two generic runtime systems: PaRSEC and StarPU. The task graph of the factorization step is made available to the two runtimes, giving them the opportunity to process and optimize its traversal in order to maximize the algorithm's efficiency on the targeted hardware platform. We compare the performance of the PaStiX solver on top of its native internal scheduler, PaRSEC, and StarPU on different execution environments. The analysis highlights that these generic task-based runtimes achieve results comparable to the application-optimized embedded scheduler on homogeneous platforms. Furthermore, they are able to significantly speed up the solver on heterogeneous environments by taking advantage of the accelerators while hiding the complexity of their efficient manipulation from the programmer. Comment: Heterogeneity in Computing Workshop (2014)

    From hybrid architectures to hybrid solvers

    Solving large sparse systems of linear equations is a crucial and time-consuming step that arises in many scientific and engineering applications. Consequently, many parallel techniques for sparse matrix solution have been studied, designed and implemented, based on factorization or on hybrid iterative-direct approaches. In this context, graph partitioning and nested dissection ideas have played a crucial role. The main goal of this presentation is to give an overview of the continuum between these various algorithmic approaches and to present the improvements of the algorithms and of the associated parallel implementations in a manycore context. Numerical experiments on large irregular real-life problems illustrate this work.

    Memory Optimization to Build a Schur Complement in a Hybrid Solver

    Solving a linear system Ax = b in parallel, where A is a large sparse matrix, is a recurrent problem in numerical simulations. One of the most promising state-of-the-art algorithms is the hybrid method based on domain decomposition and the Schur complement. In this method, a direct solver is used as a subroutine on each subdomain matrix. This approach is subject to serious memory overhead. In this paper, we investigate new techniques to reduce memory consumption while the direct solver builds the Schur complement. Our method reduces the memory peak by 10% to 30% on each process for typical test cases.

    A NUMA Aware Scheduler for a Parallel Sparse Direct Solver

    Over the past few years, parallel sparse direct solvers have made significant progress. They are now able to efficiently solve real-life three-dimensional problems with several million equations. Nevertheless, the need for a large amount of memory is often a bottleneck in these methods. The authors have proposed a hybrid MPI-thread implementation of a direct solver that is well suited for SMP nodes or modern multi-core architectures. Modern multi-processing architectures are commonly based on shared memory systems with NUMA behavior. These computers are composed of several chipsets, each including one or several cores associated with a memory bank. Such an architecture implies hierarchical memory access times from a given core to the different memory banks, which do not exist on SMP nodes. Thus, the main data structure of our targeted application has been modified to be more suitable for NUMA architectures. We also introduce a simple way to dynamically schedule an application based on a dependency tree while taking NUMA effects into account. Results obtained with these modifications are illustrated by showing the performance of the PaStiX solver on different platforms and matrices.

    Numerical simulation of unsteady MHD flows and applications

    We present a robust numerical method for solving the compressible ideal magnetohydrodynamic (MHD) equations. It is based on the Residual Distribution (RD) algorithms, which have already been tested successfully on many problems. We adapted the scheme to the multi-dimensional unsteady MHD model. The constraint ∇ · B = 0 is enforced by the use of a Generalized Lagrange Multiplier (GLM) technique. First, we present this complete system and the keys to obtaining its eigensystem, as we may need it in the algorithm. Next, we introduce the numerical scheme, built to obtain a compressible, unsteady and implicit solver that has good shock-capturing properties and is second-order accurate at the converged state. To show the efficiency of our method, we then comment on some 2D results. We end by pointing out some issues and the extensions we plan for this solver.

    Nested dissection with balanced halo

    Nested Dissection was introduced by A. George and is a well-known and very popular heuristic for sparse matrix ordering that reduces both the fill-in and the operation count during the numerical factorization. For hybrid methods that mix direct and iterative solvers, obtaining a domain decomposition that balances both the size of the domain interiors and the size of the interfaces is a key point for load balancing and efficiency in a parallel context. For this purpose, we revisit the algorithm introduced by Lipton, Rose and Tarjan, which performed the recursion in a different manner.

    Improving the numerical behaviour of solvers by taking the matrix weights into account during domain decomposition

    This work, performed within the ANR PETALh project ("Preconditioning for scientific applications on heterogeneous petascale machines"), reports our attempts to improve the numerical behaviour of the solvers developed in the ANR PETAL project by adding numerical information to the input of the partitioning algorithms. The direction pursued in this study is to use the matrix coefficients (weights) during the domain decomposition.

    A NUMA Aware Scheduler for a Parallel Sparse Direct Solver

    Over the past few years, parallel sparse direct solvers have made significant progress and are now able to efficiently solve industrial three-dimensional problems with several million unknowns. To solve these problems efficiently, the PaStiX and WSMP solvers, for example, provide a hybrid MPI-thread implementation well suited for SMP nodes or multi-core architectures. This drastically reduces the memory overhead of the factorization and improves the scalability of the algorithms. However, modern architectures introduce new hierarchical memory accesses that are not handled in these solvers. In this paper, we present three improvements to the PaStiX solver that increase performance on modern architectures: memory allocation, communication overlap, and dynamic scheduling. Results on numerical test cases are presented to demonstrate the efficiency of the approach on NUMA architectures.

    Adaptive mesh refinement for the numerical simulation of MHD instabilities in tokamaks: the JOREK code

    The purpose of this paper is to illustrate both the validity and the advantages of the adaptive mesh refinement strategy implemented in the recent version of the 3D non-linear MHD code JOREK, which uses a technique based on the bicubic Bézier surfaces developed in the paper of Czarny and Huijsmans. We describe the physical model and establish a refinement criterion. We then present the numerical results of adaptive mesh refinement simulations for a tearing instability test case and for a test case modelling the injection of a small pellet of frozen hydrogen into a tokamak.